Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow serial_diff under min_doc_count aggs #86401

Merged
merged 5 commits into from
May 19, 2022

Conversation

nik9000
Copy link
Member

@nik9000 nik9000 commented May 3, 2022

Before 6.6 we allowed the serial_diff agg in lots of places, including
under date_histogram aggregations with min_doc_count: 1. This
allowed you to take the difference of two adjacent buckets, skipping any
that don't have any value. So if you have a value at 10am, no value at
11am, and another at noon the serial_diff will diff the 10am and noon
values. In 6.6 we disabled support for this. We'd like it back.

@nik9000 nik9000 requested a review from not-napoleon May 3, 2022 17:16
@elasticmachine elasticmachine added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label May 3, 2022
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-analytics-geo (Team:Analytics)

@elasticsearchmachine
Copy link
Collaborator

Hi @nik9000, I've created a changelog YAML for you.

nik9000 added 2 commits May 3, 2022 13:23
Before 6.6 we allowed the `serial_diff` agg in lots of places, including
under `date_histogram` aggregations with `min_doc_count: 1`. This
allowed you to take the difference of two adjacent buckets, skipping any
that don't have any value. So if you have a value at 10am, no value at
11am, and another at noon the `serial_diff` will diff the 10am and noon
values. In 6.6 we disabled support for this. We'd like it back.
@sethmlarson sethmlarson added the Team:Clients Meta label for clients team label May 3, 2022
@elasticmachine
Copy link
Collaborator

Pinging @elastic/clients-team (Team:Clients)

@nik9000
Copy link
Member Author

nik9000 commented May 3, 2022

I took the opportunity to remove an annoying TODO with a bunch of instanceofs and add some yaml tests for serial_diff.

@arteam arteam added v7.17.5 and removed v7.17.4 labels May 17, 2022
Copy link
Member

@not-napoleon not-napoleon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@@ -216,6 +209,11 @@ public void addBucketPathValidationError(String error) {
*/
public abstract void validateParentAggSequentiallyOrdered(String type, String name);

/**
* Validates that the parent is sequentially ordered.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a method right above that has the same javadoc. I know these two methods don't do the same thing, so they should explain how they're different.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 I meant to do this but skipped it. thanks.

@@ -134,6 +131,11 @@ public void validateParentAggSequentiallyOrdered(String type, String name) {
+ "] must have a histogram, date_histogram or auto_date_histogram as parent but doesn't have a parent"
);
}

@Override
public void validateParentAggSequentiallyOrderedWithoutSkips(String type, String name) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs javadoc. It's subtly different from the version in ForInsideTree, and it's not clear why they do two different things without loading the whole context.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It always throws an exception. I'll jiggle it so it's more clear.

@Override
protected void validateSequentiallyOrderedWithoutGaps(String type, String name, Consumer<String> addValidationError) {
if (minDocCount != 0) {
addValidationError.accept("parent histogram of " + type + " aggregation [" + name + "] must have min_doc_count of 0");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to think about extended/hard bounds here? I'm pretty sure the answer is "no", but I want to say it out loud.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We never have in the past. I think it's ok because those'll just make the result wider without adding any gaps.

Copy link
Contributor

@sethmlarson sethmlarson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good from an API perspective. Thanks for adding test cases!

@nik9000 nik9000 added auto-merge-without-approval Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!) backport pending labels May 18, 2022
@nik9000
Copy link
Member Author

nik9000 commented May 19, 2022

run elasticsearch-ci/part-2

@elasticsearchmachine elasticsearchmachine merged commit 1ff0ce4 into elastic:master May 19, 2022
@nik9000 nik9000 deleted the serial_diff_gaps_ok branch May 19, 2022 18:02
nik9000 added a commit to nik9000/elasticsearch that referenced this pull request May 19, 2022
Before 6.6 we allowed the `serial_diff` agg in lots of places, including
under `date_histogram` aggregations with `min_doc_count: 1`. This
allowed you to take the difference of two adjacent buckets, skipping any
that don't have any value. So if you have a value at 10am, no value at
11am, and another at noon the `serial_diff` will diff the 10am and noon
values. In 6.6 we disabled support for this. We'd like it back.
nik9000 added a commit that referenced this pull request May 20, 2022
Before 6.6 we allowed the `serial_diff` agg in lots of places, including
under `date_histogram` aggregations with `min_doc_count: 1`. This
allowed you to take the difference of two adjacent buckets, skipping any
that don't have any value. So if you have a value at 10am, no value at
11am, and another at noon the `serial_diff` will diff the 10am and noon
values. In 6.6 we disabled support for this. We'd like it back.
nik9000 added a commit to nik9000/elasticsearch that referenced this pull request May 31, 2022
My change elastic#86401 never made it into 7.17.4 so the test was confused.
This bumps the version. It'll make it into 7.17.5.
elasticsearchmachine pushed a commit that referenced this pull request May 31, 2022
My change #86401 never made it into 7.17.4 so the test was confused.
This bumps the version. It'll make it into 7.17.5.

Closes #87113
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Analytics/Aggregations Aggregations auto-merge-without-approval Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!) >bug Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) Team:Clients Meta label for clients team v7.17.5 v8.3.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants